Training fails to elicit subtle reasoning in current language models
I think the Sonnet 3.5 monitor / Sonnet 3.7 attacker setup tested only a narrow intelligence gap.
The paper itself notes that “as developers continue to scale RL reasoning compute, models may become more capable of subtle reasoning.”
I see this as a minor empirical result with unclear generalisability.